[*********************100%***********************] 29 of 29 completed
The low and high thresholds for regime classification are adjustable. Values of -0.5 and 0.5 stand for a -0.5% and a 0.5% change per step.
We use multiple clustering methods to cross-validate our deterministic trend method.
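To make the role of the two thresholds concrete, here is a minimal sketch. It assumes a three-way split into hypothetical `"down"`, `"flat"`, and `"up"` labels; the function name, the labels, and the number of regimes are illustrative assumptions and may differ from what the notebook actually uses.

```python
def classify_regime(pct_change_per_step, low=-0.5, high=0.5):
    """Label a segment by its slope, given in percent change per step.

    `low` and `high` are the adjustable thresholds: -0.5 means -0.5% per step.
    The three labels are hypothetical names chosen for this sketch.
    """
    if pct_change_per_step < low:
        return "down"
    if pct_change_per_step > high:
        return "up"
    return "flat"


# Example slopes, in percent change per step.
labels = [classify_regime(s) for s in (-0.8, 0.1, 0.9)]
```

Widening the band between `low` and `high` moves more segments into the middle class, so the thresholds directly control how aggressive the labeling is.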
Counter({1: 1158, 0: 1771, 2: 2362, 5: 155, 3: 101, 4: 8})
[Counter({0: 690, 5: 154, 2: 92, 4: 8}),
Counter({2: 687, 0: 1071, 5: 1}),
Counter({2: 1393, 1: 411, 0: 9}),
Counter({1: 747, 2: 190, 3: 101, 0: 1})]
The major cluster of regime 0 is 0. 0.73 of this regime is in this cluster. 0.39 of this label is in this regime.
The major cluster of regime 1 is 0. 0.61 of this regime is in this cluster. 0.6 of this label is in this regime.
The major cluster of regime 2 is 2. 0.77 of this regime is in this cluster. 0.59 of this label is in this regime.
The major cluster of regime 3 is 1. 0.72 of this regime is in this cluster. 0.65 of this label is in this regime.
KNN is less convincing in this example, but you can try it on your own dataset.
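The code that produced these summary lines is not shown. One plausible reconstruction, assuming the first `Counter` holds the overall cluster-label totals and the list holds the cluster counts within regimes 0 to 3, is sketched below. The function name is hypothetical.

```python
from collections import Counter

# Overall cluster-label counts and per-regime counts, copied from the output above.
cluster_totals = Counter({1: 1158, 0: 1771, 2: 2362, 5: 155, 3: 101, 4: 8})
per_regime = [
    Counter({0: 690, 5: 154, 2: 92, 4: 8}),
    Counter({2: 687, 0: 1071, 5: 1}),
    Counter({2: 1393, 1: 411, 0: 9}),
    Counter({1: 747, 2: 190, 3: 101, 0: 1}),
]


def major_cluster_summary(per_regime, cluster_totals):
    """Return (regime, major cluster, share of regime, share of cluster label)."""
    summary = []
    for regime, counts in enumerate(per_regime):
        cluster, hits = counts.most_common(1)[0]
        share_of_regime = hits / sum(counts.values())
        share_of_label = hits / cluster_totals[cluster]
        summary.append(
            (regime, cluster, round(share_of_regime, 2), round(share_of_label, 2))
        )
    return summary


for regime, cluster, in_regime, in_label in major_cluster_summary(per_regime, cluster_totals):
    print(f"regime {regime}: major cluster {cluster}, "
          f"{in_regime} of regime, {in_label} of label")
```

Under these assumptions the shares match the reported figures. Regimes 0 and 1 both map to cluster 0, so this clustering does not separate those two regimes.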
We treat each time series as a vector along a single dimension and stack all vectors to form a multi-dimensional array.
Interpolate the normalized adjcp/pct_return
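Segments of different lengths have to be brought to a common length before they can be stacked. The sketch below shows one way to do that with linear interpolation. It assumes the inputs are already normalized; the helper name `resample` and the target length are illustrative choices, not taken from the notebook.

```python
import numpy as np


def resample(series, length):
    """Linearly interpolate a 1-D series onto `length` evenly spaced points."""
    series = np.asarray(series, dtype=float)
    old_x = np.linspace(0.0, 1.0, num=len(series))
    new_x = np.linspace(0.0, 1.0, num=length)
    return np.interp(new_x, old_x, series)


# Two already-normalized segments of different lengths (made-up values).
segments = [
    [1.0, 1.1, 1.2],
    [1.0, 0.9, 0.8, 0.7, 0.6],
]

# Each resampled segment becomes one row; the result has shape (n_segments, length).
stacked = np.stack([resample(s, 4) for s in segments])
```

Each row of `stacked` can then be fed to a clustering or dimensionality-reduction method as a single sample.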
t-SNE of the linear model
t-SNE of DWT
t-SNE of KNN
In practice, we usually set a constraint on the minimum length of a regime segment. This post-processing step adjusts the regime series so that every segment meets the minimum length.
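The exact merging rule is not stated here. As a minimal sketch, assume that any run shorter than the minimum length is absorbed into the run before it, and that a short run at the very start is absorbed into the run after it. The function name `enforce_min_length` is hypothetical.

```python
from itertools import groupby


def enforce_min_length(labels, min_len):
    """Absorb runs shorter than `min_len` into a neighboring run.

    Assumed rule for this sketch: a short run joins the preceding run;
    a short run at the very start joins the following run.
    """
    runs = [[label, len(list(group))] for label, group in groupby(labels)]
    merged = []
    for label, length in runs:
        # Merge into the previous run if this run is too short, or if it
        # continues the same label after an earlier merge.
        if merged and (length < min_len or merged[-1][0] == label):
            merged[-1][1] += length
        else:
            merged.append([label, length])
    # A short first run has no predecessor, so fold it into the next run.
    if len(merged) > 1 and merged[0][1] < min_len:
        merged[1][1] += merged[0][1]
        merged.pop(0)
    return [label for label, length in merged for _ in range(length)]


raw = [0, 0, 0, 1, 2, 2, 2, 2, 0, 0]
smoothed = enforce_min_length(raw, min_len=3)
```

The output has the same length as the input, so it can replace the original regime series index for index.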
| | 0 |
|---|---|
| count | 2220.000000 |
| mean | 0.079565 |
| std | 0.320038 |
| min | -3.095165 |
| 25% | -0.092169 |
| 50% | 0.085719 |
| 75% | 0.257236 |
| max | 2.339544 |
We cluster the stocks into groups based on their adjusted close prices (adjclose).
[['CSCO', 'INTC', 'KO', 'MRK', 'VZ', 'WBA'], ['GS', 'HD', 'UNH'], ['AMGN', 'CAT', 'CRM', 'HON', 'MCD', 'MMM', 'MSFT', 'V'], ['IBM'], ['BA'], ['AAPL', 'AXP', 'CVX', 'DIS', 'JNJ', 'JPM', 'NKE', 'PG', 'TRV', 'WMT']]
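The nested list above has one inner list of tickers per cluster. The clustering algorithm itself is not shown, so the sketch below only illustrates how such a list could be assembled once each ticker has a cluster label. The function name and the example labels are hypothetical.

```python
from collections import defaultdict


def group_by_label(tickers, labels):
    """Collect tickers into one sorted list per cluster label, ordered by label."""
    groups = defaultdict(list)
    for ticker, label in zip(tickers, labels):
        groups[label].append(ticker)
    return [sorted(groups[label]) for label in sorted(groups)]


# Made-up labels for four tickers, for illustration only.
example = group_by_label(["IBM", "BA", "GS", "HD"], [1, 2, 0, 0])
```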
This is cluster 0
This is cluster 1
This is cluster 2
This is cluster 3
This is cluster 4
This is cluster 5